


정보과학회논문지 (Journal of KIISE)

한글제목 (Korean Title): 전이 학습과 어텐션(Attention)을 적용한 합성곱 신경망 기반의 음성 감정 인식 모델
영문제목 (English Title): CNN-based Speech Emotion Recognition Model Applying Transfer Learning and Attention Mechanism
저자 (Author): 이정현 (Jung Hyun Lee), 윤의녕 (Ui Nyoung Yoon), 조근식 (Geun-Sik Jo)
원문수록처 (Citation): Vol. 47, No. 07, pp. 0665~0673 (2020. 07)
한글내용 (Korean Abstract, translated):
Existing speech-based emotion recognition research can be divided into studies that use a single voice feature and studies that use multiple voice features. Using a single voice feature makes it difficult to reflect the diverse elements of the voice, such as loudness, overtone structure, and vocal range. Among studies using multiple voice features, machine-learning-based approaches are in the majority, and their emotion recognition accuracy is relatively low compared with deep-learning-based studies. To address these problems, we propose a convolutional neural network (CNN) based speech emotion recognition model that uses the Mel-Spectrogram and MFCC (Mel Frequency Cepstral Coefficients) as voice features. The proposed model applies transfer learning and attention to improve training speed and accuracy, and achieves an emotion recognition accuracy of 77.65%, outperforming the comparison targets.
영문내용 (English Abstract):
Existing speech-based emotion recognition studies can be divided into those that use a single voice feature and those that use multiple voice features. When a single voice feature is used, it is difficult to reflect complex factors of the voice such as loudness, overtone structure, and vocal range. Among studies using multiple voice features, machine-learning-based approaches are in the majority, and their emotion recognition accuracy is relatively lower than that of deep-learning-based studies. To resolve this problem, we propose a speech emotion recognition model based on a convolutional neural network (CNN) that uses the Mel-Spectrogram and Mel Frequency Cepstral Coefficients (MFCC) as voice features. The proposed model applies transfer learning and attention to improve learning speed and accuracy, and achieves 77.65% emotion recognition accuracy, higher than the compared works.
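The two voice features named in the abstract, the Mel-Spectrogram and MFCC, are both derived from the short-time spectrum of the signal. The sketch below shows, in plain numpy, how these features can be computed; the parameter values (n_fft=512, hop=128, n_mels=40, n_mfcc=13) are illustrative defaults, not the authors' settings, and in practice a library such as librosa is normally used.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mel_spectrogram(y, sr, n_fft=512, hop=128, n_mels=40):
    # Frame the signal, window each frame, take the power spectrum,
    # then project onto the mel filterbank.
    window = np.hanning(n_fft)
    frames = [y[s:s + n_fft] * window
              for s in range(0, len(y) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.array(frames), axis=1)) ** 2
    return mel_filterbank(n_mels, n_fft, sr) @ power.T   # (n_mels, n_frames)

def mfcc(y, sr, n_mfcc=13, **kw):
    # MFCCs: log mel energies decorrelated by a DCT-II.
    logmel = np.log(mel_spectrogram(y, sr, **kw) + 1e-10)
    n = logmel.shape[0]
    k = np.arange(n)
    basis = np.cos(np.pi * (k[:, None] + 0.5) * np.arange(n_mfcc)[None, :] / n)
    return basis.T @ logmel                               # (n_mfcc, n_frames)

# Usage on one second of a 440 Hz tone at 16 kHz:
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
M = mel_spectrogram(y, sr)   # 2-D "image" a CNN can consume
C = mfcc(y, sr)
```

In a CNN-based model like the one described, the resulting 2-D arrays (frequency x time) are treated as single-channel images, which is what makes image-pretrained networks and transfer learning applicable.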
키워드 (Keyword): speech emotion recognition, Mel-Spectrogram, MFCC, convolutional neural network (CNN), transfer learning, attention
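The abstract does not specify which form of attention the model applies over the CNN features. One common variant is attention-weighted pooling over time steps, sketched below; the names (`attention_pool`, the query vector `w`) are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(features, w):
    # features: (T, D) per-time-step feature vectors from a CNN.
    # w: (D,) learned query vector scoring each time step's relevance.
    scores = features @ w          # (T,) one relevance score per step
    alpha = softmax(scores)        # attention weights, sum to 1
    pooled = alpha @ features      # (D,) weighted summary vector
    return pooled, alpha

# Usage: pool 6 time steps of 8-dimensional features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))
w = rng.normal(size=8)
pooled, alpha = attention_pool(feats, w)
```

Compared with plain average pooling, this lets emotionally salient frames dominate the utterance-level representation, which is the usual motivation for adding attention to a speech emotion recognizer.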